Despite centuries of close association, statistics and astronomy are surprisingly distant today. Most observational astronomical research relies on an inadequate methodological toolbox. Yet the needs are substantial: astronomy encounters sophisticated problems involving sampling theory, survival analysis, multivariate classification and analysis, time series analysis, wavelet analysis, spatial point processes, nonlinear regression, bootstrap resampling and model selection. We review the recent resurgence of astrostatistical research and outline new challenges raised by the emerging Virtual Observatory. Our essay ends with a list of research challenges and infrastructure needs for astrostatistics in the coming decade.



1. The glorious history of astronomy and statistics



Astronomy is perhaps the oldest observational science^1. The effort to understand the mysterious luminous objects in the sky has been an important element of human culture for at least 10^ years. Quantitative measurements of celestial phenomena were carried out by many ancient civilizations. The classical Greeks were not active observers but were unusually creative in the application of mathematical principles to astronomy. The geometric models of the Platonists, with crystalline spheres spinning around the static Earth, were elaborated in detail, and this model endured in Europe for 15 centuries. But it was another Greek natural philosopher, Hipparchus, who made one of the first applications of mathematical principles that we now consider to be in the realm of statistics. Finding scatter in Babylonian measurements of the length of a year, defined as the time between solstices, he took the middle of the range (rather than the mean or median) for the best value.

This is but one of many discussions of statistical issues in the history of astronomy. Ptolemy estimated parameters of a non-linear cosmological model using a minimax goodness-of-fit method. Al-Biruni discussed the dangers of propagating errors from inaccurate instruments and inattentive observers. While some Medieval scholars advised against the acquisition of repeated measurements, fearing that errors would compound rather than compensate for each other, the usefulness of the mean to increase precision was demonstrated with great success by Tycho Brahe.

During the 19th century, several elements of modern mathematical statistics were developed in the context of celestial mechanics, where the application of Newtonian theory to solar system phenomena gave astonishingly precise and self-consistent quantitative inferences. Legendre developed L2 least-squares parameter estimation to model cometary orbits. The least-squares method became an instant success in European astronomy and geodesy. Other astronomers and physicists contributed to statistics: Huygens wrote a book on probability in games of chance; Newton developed an interpolation procedure; Halley laid foundations of actuarial science; Quetelet worked on statistical approaches to social sciences; Bessel first used the concept of "probable error"; and Airy wrote a volume on the theory of errors.

^1 The historical relationship between astronomy and statistics is described in references [15], [38] and elsewhere. Our Astrostatistics monograph gives more detail and contemporary examples of astrostatistical problems [3].

But the two fields diverged in the late 19th and 20th centuries. Astronomy leaped onto the advances of physics (electromagnetism, thermodynamics, quantum mechanics and general relativity) to understand the physical nature of stars, galaxies and the Universe as a whole. A subfield called "statistical astronomy" was still present but concentrated on rather narrow issues involving star counts and Galactic structure [30]. Statistics concentrated on analytical approaches. It found its principal applications in the social sciences, biometrical sciences and practical industries (e.g., Sir R. A. Fisher's employment by the British agricultural service).



2. Statistical needs of astronomy today 

Contemporary astronomy abounds in questions of a statistical nature. In addition to the exploratory data analysis and simple heuristic (usually linear) modeling common in other fields, astronomers also often interpret data in terms of complicated non-linear models based on deterministic astrophysical processes. The phenomena studied must obey known behaviors of atomic and nuclear physics, gravitation and mechanics, thermodynamics and radiative processes, and so forth. 'Modeling' data may thus involve both the selection of a model family based on an astrophysical understanding of the conditions under study, and a statistical effort to find parameters for the specified model. A wide variety of issues thus arise:

• Does an observed group of stars (or galaxies or molecular clouds or γ-ray sources) constitute a typical and unbiased sample of the vast underlying population of similar objects?

• When and how should we divide/classify these objects into 2, 3 or more subclasses?

• What is the intrinsic physical relationship between two or more properties of a class of objects, especially when confounding variables or observational selection effects are present?

• How do we answer such questions in the presence of observations with measurement errors and flux limits?

• When is a blip in a spectrum (or image or time series) a real signal rather than a random event from Gaussian (or often Poissonian) noise or confounding variables?

• How do we interpret the vast range of temporally variable objects: periodic signals from rotating stars or orbiting extrasolar planets, stochastic signals from accreting neutron stars or black holes, explosive signals from magnetic reconnection flares or γ-ray bursts?

• How do we model 2-, 3- or 6-dimensional point sets representing photons in an image, galaxies in the Universe, or Galactic stars in phase space?

• How do we quantify continuous structures seen in the sky, such as the cosmic microwave background and the interstellar and intergalactic gaseous media?

• How do we fit astronomical spectra to highly non-linear astrophysical models based on atomic physics and radiative processes, including confidence limits on the best-fit parameters?
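As a minimal illustration of the Poissonian "blip" question in the list above, the sketch below computes the probability that background alone yields at least the observed number of counts in a single bin. It assumes the expected background `mu` is known exactly; the function name is our own, and a real analysis must also propagate the uncertainty in the background.

```python
import math

def poisson_tail(n_obs: int, mu: float) -> float:
    """P(N >= n_obs) for N ~ Poisson(mu): the chance that background alone
    yields at least n_obs counts in a single bin (mu assumed known exactly)."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-mu)      # P(N = 0)
    cdf = term
    for k in range(1, n_obs):
        term *= mu / k        # P(N = k) obtained from P(N = k - 1)
        cdf += term
    # Complement of P(N <= n_obs - 1); loses relative precision for extremely small tails.
    return max(0.0, 1.0 - cdf)

# Hypothetical bin: 3.0 background counts expected, 9 observed.
print(f"P(N >= 9 | mu = 3.0) = {poisson_tail(9, 3.0):.3g}")
```

If many bins or trial positions are searched, this single-bin probability understates the chance of a spurious detection; a crude (Bonferroni) correction multiplies it by the number of independent trials.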

A superficial examination of the astronomical literature^2 shows that such questions are very common today. Of ~15,000 refereed papers published annually, 1% have "statistics" or "statistical" in their title, 5% have "statistics" in their abstract, 10% treat time-variable objects, 5-10% (est.) present or analyze multivariate datasets, and 5-10% (est.) fit parametric models. Accounting for overlaps, we roughly estimate that ~3,000 distinct studies each year require non-trivial statistical methodologies. Roughly 10% of these are principally concerned with statistical methods; indeed, some of these purport to develop new methods or improve on established ones.

^2 Such bibliometric measures are easily accomplished, as the entire astronomical research literature is on-line (in full text at subscribing institutions) through the NASA-supported Astrophysics Data System, http://adsabs.harvard.edu/abstract_service.html.



3. Astrostatistics today 

We thus find that astronomy and astrophysics today require a vast range of statistical capabilities. In statistical jargon, it helps for astronomers to know something about: sampling theory, survival analysis with censoring and truncation, measurement error models, multivariate classification and analysis, harmonic and autoregressive time series analysis, wavelet analysis, spatial point processes and continuous surfaces, density estimation, linear and non-linear regression, model selection, and bootstrap resampling. In some cases, astronomers need combinations of methodologies that have not yet been fully developed (§5 below).

Faced with such a complex of challenges, mechanical exposure to a wider variety of techniques is a necessary but not sufficient prerequisite for high-quality statistical analyses. Astronomers also need to be imbued with established principles of statistical inference: e.g., hypothesis testing and parameter estimation, nonparametric and parametric inference, Bayesian and frequentist approaches, and the assumptions and conditions of applicability underlying any given statistical method.

Unfortunately, we find that the majority of the thousands of astronomical studies requiring statistical analyses use a very limited set of classical methods. The most common tools used by astronomers are: Fourier transforms for temporal analysis (developed by Fourier in 1807), least-squares regression and goodness-of-fit (Legendre in 1805, Pearson in 1900, Fisher in 1924), the nonparametric Kolmogorov-Smirnov 1- and 2-sample tests (Kolmogorov in 1933), and principal components analysis for multivariate tables (Hotelling in 1936).

Even traditional methods are often misused. Feigelson & Babu [9] found that astronomers use interchangeably up to 6 different fits for bivariate linear least-squares regression: ordinary least squares (OLS), inverse regression, orthogonal regression, major axis regression, the OLS mean, and the OLS bisector. Not only did this lead to confusion in comparing studies (e.g., in measuring the expansion of the Universe via Hubble's constant, H0), but astronomers did not realize that the confidence intervals on the fitted parameters cannot be correctly estimated with standard analytical formulae. Similarly, Protassov et al. [24] found that the majority of astronomical applications of the F test, or more generally the likelihood ratio test, violate the conditions required by asymptotic statistical theory.
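To illustrate why these fits are not interchangeable, the sketch below computes four of the slopes from the same data using the standard closed-form expressions for these estimators. The data are made up, the function name is our own, and the sketch deliberately omits slope uncertainties, which, as noted above, are not correctly given by the usual analytical formulae.

```python
import math
from statistics import fmean

def regression_slopes(x, y):
    """Slopes of four bivariate least-squares lines fitted to the same (x, y) data.
    Assumes non-degenerate data with Sxy != 0."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx   # OLS(Y|X): minimizes vertical residuals
    b2 = syy / sxy   # OLS(X|Y), "inverse regression", expressed as a dY/dX slope
    # OLS bisector: the line bisecting the OLS(Y|X) and OLS(X|Y) lines
    b3 = (b1 * b2 - 1 + math.sqrt((1 + b1**2) * (1 + b2**2))) / (b1 + b2)
    # Orthogonal regression: minimizes perpendicular distances to the line
    d = b2 - 1 / b1
    b4 = 0.5 * (d + math.copysign(math.sqrt(4 + d**2), sxy))
    return {"OLS(Y|X)": b1, "OLS(X|Y)": b2, "OLS bisector": b3, "orthogonal": b4}

# Made-up, illustrative data (not from any astronomical study).
X = [1, 2, 3, 4, 5, 6]
Y = [1.2, 2.9, 2.8, 4.9, 4.1, 6.3]
for name, slope in regression_slopes(X, Y).items():
    print(f"{name:>13s}: {slope:.3f}")
```

All four slopes coincide only for perfectly correlated data; with scatter, OLS(Y|X) and OLS(X|Y) bracket the others, which is why mixing them across studies produces spurious disagreements.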

But while the average astronomical study is limited to often-improper use of a limited repertoire of statistical methods, a significant tail of outliers is much more sophisticated. The maximization of likelihoods, often developed specially for the problem at hand, is perhaps the most common of these improvements. Bayesian approaches are also becoming increasingly in vogue.

In a number of cases, sometimes buried in technical appendices of observational papers, astronomers independently develop statistical methods. Some of these are rediscoveries of known procedures; for example, Avni et al. [2] and others recovered elements of survival analysis for treatments of left-censored data arising from nondetections of known objects. Some are quite possibly mathematically incorrect, such as various revisions to χ² for Poissonian data that assume the resulting statistic still follows the χ² distribution. On rare occasions, truly new and correct methods have emerged; for example, astrophysicist Lynden-Bell [19] discovered the maximum-likelihood estimator for a randomly truncated dataset, whose theoretical validity was later established by statistician Woodroofe [31].
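The survival-analysis treatment of nondetections can be sketched with the Kaplan-Meier product-limit estimator. The standard estimator handles right-censored data (lower limits); upper limits from nondetections are handled here by the usual device of flipping the sign of the variable. This is a minimal sketch with function names of our own; it assumes censoring independent of the true values and treats each upper limit as exact, which is precisely the assumption questioned in §5.1.

```python
from collections import Counter

def kaplan_meier(times, observed):
    """Product-limit estimate of S(t) = P(T > t) for right-censored data.
    observed[i] is True for a detection, False for a lower limit.
    Returns [(t, S(t))] at each distinct detected value, in increasing t."""
    events = Counter(t for t, o in zip(times, observed) if o)
    counts = Counter(times)
    n_at_risk = len(times)
    surv, out = 1.0, []
    for t in sorted(counts):
        d = events.get(t, 0)
        if d:
            surv *= 1.0 - d / n_at_risk
            out.append((t, surv))
        # Detections and limits at t both leave the risk set after t
        # (limits tied with a detection are counted as still at risk at t).
        n_at_risk -= counts[t]
    return out

def km_upper_limits(values, detected):
    """Left-censored data (upper limits): flip x -> -x so limits become lower limits.
    Returns [(x, estimated P(X < x))] at each detected value, in increasing x."""
    flipped = kaplan_meier([-v for v in values], detected)
    return sorted((-t, s) for t, s in flipped)

# Hypothetical fluxes; False marks a nondetection (value is an upper limit).
flux = [0.8, 1.5, 2.0, 2.2, 3.1, 4.0]
det = [True, False, True, False, True, True]
for value, frac_below in km_upper_limits(flux, det):
    print(f"estimated P(flux < {value}) = {frac_below:.3f}")
```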

A growing group of astronomers, recognizing the potential for new liaisons with the accomplishments of modern statistics, have promoted astrostatistical innovation through cross-disciplinary meetings and collaborations. Fionn Murtagh, an applied mathematician at Queen's University (Belfast) with long experience in astronomy, and his colleagues have run conferences and authored many useful monographs (e.g., [16], [17], [22] and [27]). We at Penn State have run a series of Statistical Challenges in Modern Astronomy meetings with both communities in attendance (e.g., [3] and [10]). Alanna Connors has organized brief statistics sessions at large astronomy meetings, and we have organized brief astronomy sessions at large Joint Statistical Meetings. We wrote a short volume called Astrostatistics [3] intended to familiarize scholars in one discipline with relevant issues in the other discipline. Other conference series are devoted to technical issues in astronomical data analysis but typically have limited participation by statisticians. These include the dozen Astronomical Data Analysis Software and Systems conferences (e.g., [23]), several Erice workshops on Data Analysis in Astronomy (e.g., [8]), and the new SPIE Astronomical Data Analysis conferences (e.g., [26]).

Most importantly, several powerful astrostatistical research collaborations have emerged. At Harvard University and the Smithsonian Astrophysical Observatory, David van Dyk worked with scientists at the Chandra^3 X-ray Center on several issues, particularly Bayesian approaches to parametric modeling of spectra in light of complicated instrumental effects. At Carnegie Mellon University and the University of Pittsburgh, the Pittsburgh Computational Astrophysics group addressed several issues, such as developing powerful techniques for multivariate classification of extremely large datasets and applying nonparametric regression methods to cosmology. Both of these groups involved academics, researchers and graduate students from both fields working closely for several years to achieve a critical mass of cross-disciplinary capabilities.

Other astrostatistical collaborations must be mentioned. David Donoho (Statistics at Stanford University) works with Jeffrey Scargle (NASA Ames Research Center) and others on applying advanced wavelet methods to astronomical problems. James Berger (Statistics at Duke University) has worked with astronomers William Jefferys (University of Texas), Thomas Loredo (Cornell University), and Alanna Connors (Eureka Inc.) on Bayesian methodologies for astronomy. Bradley Efron (Statistics at Stanford University) has worked with astrophysicist Vahe Petrosian (also at Stanford) on survival methods for interpreting γ-ray bursts. Philip Stark (Statistics at University of California, Berkeley) has collaborated with solar physicists in the GONG program to improve analysis of oscillations of the Sun (helioseismology). More such collaborations exist in the U.S., Europe and elsewhere.

4. The Virtual Observatory: A new imperative for astrostatistics

A major new trend is emerging in observational astronomy with the production of huge, uniform, multivariate databases from specialized survey projects and telescopes^4. But these databases are heterogeneous in character, reside at widely dispersed locations, and are accessed through different database systems. Examples include:

1. 10^ to 10^-object catalogs of stars and stellar extragalactic objects (i.e., quasars). These include the all-sky photographic optical USNO-B1 catalog, the all-sky near-infrared 2MASS catalog, and the wide-field Sloan Digital Sky Survey (SDSS). Five to ten photometric values, each with heteroscedastic measurement errors (i.e., different for each data point), are available for each object.

2. 10^ to 10^-galaxy redshift catalogs from the 2-degree Field (2dF) and SDSS spectroscopic surveys. The main goal is characterization of the hierarchical, nonlinear and anisotropic clustering of galaxies in a 3-dimensional space. But the datasets also include a spectrum for each galaxy, each with 10^ independent measurements.

3. 10^ to 10^-source catalogs from various multiwavelength wide-field surveys, such as the NRAO Very Large Array Sky Survey in one radio band, the InfraRed Astronomical Satellite Faint Source Catalog in four infrared bands, the Hipparcos and Tycho catalogs of star distances and motions, and the X-ray Multimirror Mission Serendipitous Source Catalogue in several X-ray bands (now in progress). These catalogs are typically accompanied by large image libraries.

4. 10^ to 10^-object samples of well-characterized pre-main sequence stars, binary stars, variable stars, pulsars, interstellar clouds and nebulae, nearby galaxies, active galactic nuclei, gamma-ray bursts and so forth. There are dozens of such samples with typically 10-20 catalogued properties and often with accompanying 1-, 2- or 3-dimensional images or spectra.

5. Perhaps the most ambitious of such surveys is the planned Large-aperture Synoptic Survey Telescope (LSST), which will survey much of the entire optical sky every few nights. It is expected to generate raw databases in excess of 10 PB (petabytes) and catalogs with 10^ entries.

^3 The Chandra X-ray Observatory is one of NASA's Great Observatories. It was launched in 1999 with a total budget of around $2 billion.

^4 An enormous collection of catalogs, and some of the underlying imaging and spectral databases, are already available on-line. Access to many catalogs is provided by VizieR (http://vizier.u-strasbg.fr). The NASA Extragalactic Database (NED, http://ned.ipac.caltech.edu), the SIMBAD stellar database (http://simbad.u-strasbg.fr), and ADS (footnote 2) give integrated access to many catalogs and bibliographic information. Raw data are available from all U.S. space-based observatories; see, for example, the Multi-mission Archive at Space Telescope (MAST, http://archive.stsci.edu) and the High Energy Astrophysics Science Archive Research Center (HEASARC, http://heasarc.gsfc.nasa.gov).

An international effort known as the Virtual Observatory (VO) is now underway to coordinate and federate these diverse databases, making them readily accessible to the scientific user [6, 29]. Considerable progress is being made in the establishment of the necessary data and metadata infrastructure and standards, interoperability issues, data mining, and technology demonstration prototype services^5. But scientific discovery requires more than effective recovery and distribution of information. After the astronomer obtains the data of interest, tools are needed to explore the datasets. How do we identify correlations and anomalies within the datasets? How do we classify the sources to isolate subpopulations of astrophysical interest? How do we use the data to constrain astrophysical interpretation, which often involves highly non-linear parametric functions derived from fields such as physical cosmology, stellar structure or atomic physics? These questions lie under the aegis of statistics.

^5 See http://www.ivoa.net and http://us-vo.org for entry into Virtual Observatory projects.

A particular problem relevant to statistical computing is that, while the speed of CPUs and the capacity of inexpensive hard disks rise rapidly, computer memory capacities grow at a slower pace. Combining the largest optical/near-infrared object catalogs today produces a table with > 1 billion objects and up to a dozen columns of photometric data. Such large datasets effectively preclude use of all standard multivariate statistical packages and visualization tools (e.g., R and GGobi), which are generally designed to place the entire database into computer memory. Even sorting the data to produce quantiles may be computationally infeasible.
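One way around the memory limit for quantiles is to estimate them from a uniform random subsample drawn in a single pass over the data. The sketch below uses the classic "Algorithm R" reservoir sampler; the function names are our own, and the result is an approximate quantile whose accuracy depends on the reservoir size `k`, not an exact one.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: uniform random sample of k items from a stream of unknown
    length, using O(k) memory and a single pass."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)   # uniform on 0..i inclusive
            if j < k:
                sample[j] = item    # item i is kept with probability k / (i + 1)
    return sample

def approx_quantile(stream, q, k=10_000, rng=None):
    """Approximate q-quantile (0 <= q <= 1) from a reservoir sample, avoiding a full sort."""
    sample = sorted(reservoir_sample(stream, k, rng))
    idx = min(len(sample) - 1, int(q * len(sample)))
    return sample[idx]

# Stand-in for catalog values streamed from disk (simulated, not real data).
gen = random.Random(42)
column = (gen.gauss(0.0, 1.0) for _ in range(200_000))
print(f"approximate median: {approx_quantile(column, 0.5, rng=random.Random(7)):.3f}")
```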

The Virtual Observatory of the 21st century thus presents new challenges to statistical capability in two ways. First, some new methodological developments are needed (§5). Second, efficient access to both new and well-established statistical methods is needed. No single existing software package can provide the vast range of needed methods. We are now involved in developing a prototype system called VOStat to provide statistical capabilities to the VO astronomer. It is based on concepts of Web services and distributed Grid computing. Here, the statistical software and computational resources, as well as the underlying empirical databases, may have heterogeneous structures and can reside at distant locations.



5. Some grand methodological challenges for the coming decade

While it is risky to prognosticate the directions of future research, and judgments will always differ regarding the relative importance of research goals, we can outline a few "grand challenges" for astrostatistical research for the next decade or two.

5.1. Multivariate analysis with measurement errors and censoring

Traditional multivariate analysis is designed mainly for applications in the social and human sciences, where the sources of variance are largely unknowable. Measurement errors are usually ignored, or are considered to be exogenous variables in the parametric models. But astrophysicists often devote as much effort to precise determination of their errors as they devote to the measurements of the quantities of interest. The instruments are carefully calibrated to reduce systematic uncertainties, and background levels and random fluctuations are carefully evaluated to determine random errors. Except in the simple case of bivariate regression, this information on measurement errors is usually squandered.
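In the simple bivariate case, known errors can at least be used as inverse-variance weights. The sketch below fits a straight line by minimizing χ² with a separate σ for each point; it assumes Gaussian errors in y only, with σ known, and so does not address the multivariate or errors-in-both-variables problems raised above. The function name is our own.

```python
import math

def weighted_line_fit(x, y, sigma_y):
    """Fit y = a + b*x by minimizing chi^2 = sum(((y_i - a - b*x_i) / sigma_i)**2).
    Assumes Gaussian errors in y only, with known sigma_i > 0.
    Returns (a, b, sigma_a, sigma_b)."""
    w = [1.0 / s**2 for s in sigma_y]          # inverse-variance weights
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return a, b, math.sqrt(Sxx / delta), math.sqrt(S / delta)

# Hypothetical measurements with point-to-point (heteroscedastic) errors.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.3]
errs = [0.1, 0.3, 0.2, 1.0, 0.5]
a, b, sa, sb = weighted_line_fit(xs, ys, errs)
print(f"a = {a:.3f} +/- {sa:.3f}, b = {b:.3f} +/- {sb:.3f}")
```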

While heteroscedastic measurement errors with known variances are common in all physical sciences, only astronomy frequently has nondetections when observations are made at new wavelengths of known objects. These are data points where the signal lies below (say) 3 times the noise level. Here again, modern statistics has insufficient tools. Survival analysis for censored data assumes that the value below which the data point must lie is known with infinite precision, rather than being generated from a distribution of noise. Astronomer Herman Marshall [20] makes an interesting attempt to synthesize measurement errors and nondetections, but statistician Leon Gleser [14] argues that he has only recovered Fisher's failed theory of fiducial distributions. Addressing this issue in a self-consistent statistical theory is a profound challenge that lies at the heart of interpreting the data astronomers obtain at the telescope.

5.2. Statistical inference and visualization with very-large-N datasets

The need for computational software for extremely large databases (multi-terabyte image and spectrum libraries and multi-billion-object catalogs) is discussed in §4. A suite of approximate methods based on flowing data streams or adaptive sampling of large datasets resident on hard disks should be sought. Visualization methods involving smoothing, multidimensional shading and variable transparency should be brought into the astronomer's toolbox. Here, considerable work is being conducted by computer scientists and applied mathematicians in other applied fields, so independent development by astrostatisticians might not be necessary to achieve certain goals.
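As a small example of the data-stream approach, the sketch below computes a mean and variance in one pass with O(1) memory using Welford's algorithm, and merges partial results computed separately on different chunks (for instance, catalog files on different disks). The class name is our own; this covers only the simplest statistics, not the multivariate methods that are still needed.

```python
import math

class RunningStats:
    """One-pass mean and variance (Welford's algorithm) with O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the current mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self, ddof: int = 1) -> float:
        return self._m2 / (self.n - ddof) if self.n > ddof else math.nan

    @property
    def std(self) -> float:
        return math.sqrt(self.variance())

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine statistics from two disjoint chunks of the data."""
        out = RunningStats()
        out.n = self.n + other.n
        if out.n == 0:
            return out
        delta = other.mean - self.mean
        out.mean = self.mean + delta * other.n / out.n
        out._m2 = self._m2 + other._m2 + delta**2 * self.n * other.n / out.n
        return out

# Hypothetical: statistics accumulated separately over two chunks, then combined.
chunk1, chunk2 = RunningStats(), RunningStats()
for v in (10.2, 11.5, 9.8):
    chunk1.push(v)
for v in (10.9, 12.1):
    chunk2.push(v)
total = chunk1.merge(chunk2)
print(total.n, round(total.mean, 3), round(total.std, 3))
```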

5.3. A cookbook for construction of likelihoods and Bayesian computation

While the concepts of likelihoods and their applications in maximum likelihood estimation, Bayes' Theorem and Bayes factors are becoming increasingly well known in astronomical research, their application to real-life problems is still an art for the expert rather than a tool for the masses. Part of the problem is conceptual; astronomers need training in how to construct likelihoods for familiar parametric situations (e.g., power-law distributions or a Poisson process). Part of the problem is computational; astronomers need methods and software for the oft-complex computations. Many such methods, such as Markov chain Monte Carlo, are already well established and can be directly adopted for astronomy [13]. For example, astronomers are often not fully aware of the broad applicability of the EM Algorithm for maximizing likelihoods [21]^6.
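As an example of the kind of recipe such a cookbook might contain, consider a power-law distribution p(x) = (α-1)/x_min (x/x_min)^(-α) for x ≥ x_min, with x_min assumed known. Setting the derivative of the log-likelihood to zero gives the closed form α̂ = 1 + n / Σ ln(x_i/x_min), with an approximate standard error (α̂-1)/√n from the Fisher information. The sketch below implements this and checks it on simulated data; the function names are our own.

```python
import math
import random

def powerlaw_loglike(alpha, data, xmin):
    """Log-likelihood of p(x) = (alpha-1)/xmin * (x/xmin)**(-alpha), x >= xmin, alpha > 1."""
    n = len(data)
    s = sum(math.log(x / xmin) for x in data)
    return n * math.log(alpha - 1) - n * math.log(xmin) - alpha * s

def powerlaw_mle(data, xmin):
    """Closed-form MLE from d(logL)/d(alpha) = 0, plus its asymptotic standard error."""
    n = len(data)
    s = sum(math.log(x / xmin) for x in data)
    alpha_hat = 1.0 + n / s
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)

def sample_powerlaw(alpha, xmin, n, rng):
    """Inverse-CDF sampling, using F(x) = 1 - (x/xmin)**(1-alpha)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(1)
data = sample_powerlaw(2.5, 1.0, 5_000, rng)   # simulated, true alpha = 2.5
alpha_hat, alpha_err = powerlaw_mle(data, 1.0)
print(f"alpha = {alpha_hat:.3f} +/- {alpha_err:.3f}")
```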

5.4. Links between astrophysical theory and wavelets

Wavelet analysis has become a powerful and sophisticated tool for the study of features in data. Originally intended mainly for modelling time series, it is also increasingly used by astronomers for spatial analysis of images. In some ways it can be viewed as a generalization of Fourier analysis in which the basis function need not be sinusoidal in shape and, most importantly, the pattern need not extend over the entire dataset. Wavelets are thus effective in quantitatively describing complicated overlapping structures on many scales, and can also be used for signal denoising and compression. In addition, wavelets have a strong mathematical foundation.
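A minimal sketch of a discrete wavelet transform uses the simplest (Haar) wavelet: each level replaces pairs of values by a scaled local average and a scaled local difference, so a feature confined to part of the data affects only the coefficients at that location. Hard-thresholding small detail coefficients gives a crude denoiser. Function names are our own; practical astronomical work typically uses smoother wavelets and more careful threshold choices.

```python
import math

R = 1.0 / math.sqrt(2.0)   # orthonormal Haar scaling factor

def haar_forward(signal):
    """Multi-level orthonormal Haar transform (length must be a power of 2).
    Returns [approximation, coarsest details, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * R for a, b in pairs])   # local differences
        approx = [(a + b) * R for a, b in pairs]          # local averages
    return [approx] + details[::-1]

def haar_inverse(coeffs):
    """Invert haar_forward (exact up to floating-point rounding)."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        nxt = []
        for s, d in zip(approx, detail):
            nxt.extend([(s + d) * R, (s - d) * R])
        approx = nxt
    return approx

def hard_threshold(coeffs, thresh):
    """Zero every detail coefficient with |d| <= thresh; keep the approximation."""
    return [coeffs[0]] + [[d if abs(d) > thresh else 0.0 for d in lvl] for lvl in coeffs[1:]]

# A localized step is captured by a single coarse-scale detail coefficient.
print(haar_forward([1.0] * 4 + [5.0] * 4))
```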

Despite its increasing popularity in astronomical applications, wavelet analysis suffers a profound limitation in comparison with Fourier analysis. A peak in a Fourier spectrum is immediately interpretable as a vibrational, rotational or orbital motion of solid bodies. A bump or a continuum slope in a wavelet decomposition often has no analogous physically intuitive interpretation. We therefore recommend that astrophysicists seek links between wavelets and physical theory, which often involves continuous media such as turbulent plasmas in the interstellar medium and hierarchical structure formation in the early Universe. One fascinating example is the demonstration that the wavelet spectrum and Lyapunov exponent of the quasi-periodic X-ray emission from Sco X-1, which reflects the processes in an accretion disk around a neutron star, exhibit a transient chaotic behavior similar to that of water condensing and dripping onto an automobile windshield or a dripping handrail.



^6 The seminal study of the EM Algorithm is Dempster, Laird & Rubin in 1977 [7], which is one of the most frequently cited papers in statistics. However, the method was independently derived three years earlier by astronomer Leon Lucy [18] as an "iterative technique for the rectification of observed distributions" based on Bayes' Theorem. This study is widely cited in the astronomical literature; its most frequent application is in image deconvolution, where it is known as the Lucy-Richardson algorithm.
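Lucy's iteration can be written in a few lines. The 1-D sketch below assumes a known, shift-invariant point spread function `psf` of odd length, non-negative (count-like) data, and a flat starting guess; each step multiplies the current estimate by the back-projected ratio of observed to predicted data, which is also the EM update for a Poisson model. Names are our own, and the dense response matrix restricts it to small arrays.

```python
def richardson_lucy(data, psf, n_iter=50):
    """1-D Richardson-Lucy deconvolution for non-negative data.
    psf: odd-length, centred kernel, assumed known and shift-invariant."""
    n, h = len(data), len(psf) // 2
    # K[i][j]: probability that a photon emitted in true bin j is recorded in bin i.
    K = [[psf[i - j + h] if abs(i - j) <= h else 0.0 for j in range(n)] for i in range(n)]
    # Renormalize columns so edge bins, which lose part of the kernel, still sum to 1.
    col = [sum(K[i][j] for i in range(n)) for j in range(n)]
    K = [[K[i][j] / col[j] for j in range(n)] for i in range(n)]
    u = [sum(data) / n] * n                    # flat starting guess
    for _ in range(n_iter):
        pred = [sum(K[i][j] * u[j] for j in range(n)) for i in range(n)]
        ratio = [d / p if p > 0 else 0.0 for d, p in zip(data, pred)]
        # Multiplicative update: keeps u non-negative and preserves the total counts.
        u = [u[j] * sum(K[i][j] * ratio[i] for i in range(n)) for j in range(n)]
    return u

# Hypothetical example: a point source blurred by a 3-bin kernel.
observed = [0.0, 0.0, 0.25, 0.5, 0.25, 0.0, 0.0]
restored = richardson_lucy(observed, [0.25, 0.5, 0.25], n_iter=100)
print([round(v, 3) for v in restored])
```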



MOAT001






PhyStat 2003: Statistical Problems in Particle Physics, Astrophysics, and Cosmology 



5.5. Time series models for astrophysical phenomena

The quasi-periodic oscillation of Sco X-1 is only one of many examples of complex accretional behavior onto neutron stars and black holes seen in X-ray and γ-ray astronomy. The accreting Galactic black hole GRS 1915+105 exhibits a bewildering variety of distinct states of stochastic, quasi-periodic and explosive behaviors. The prompt emission from gamma-ray bursts shows a fantastic diversity of temporal behaviors, from simple smooth fast-rise-exponential-decays to stochastic spiky profiles. Violent magnetic reconnection flares on the surfaces of the Sun and other magnetically active stars also show complex behaviors. Many of these datasets are multivariate, with time series available in several spectral bands, often showing lags or hardness ratio variations of astrophysical interest.

There are also important astronomical endeavors which seek astrophysically interesting signals amidst the oft-complex noise characteristics of the detectors. The Arecibo, Parkes and VLA radio telescopes, for example, conduct searches for new radio pulsars or for extraterrestrial intelligence in nearby planetary systems. The Laser Interferometer Gravitational-Wave Observatory (LIGO) and related detectors search for both continuous periodic signals and brief bursts from perturbations in space-time predicted by Einstein's General Relativity. Here the signals sought are orders of magnitude fainter than instrumental variations.
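The simplest tool for seeking a periodic signal in evenly sampled data is the classical (Schuster) periodogram, the squared modulus of the discrete Fourier transform at trial frequencies. The sketch below is a minimal version with a function name of our own; unevenly sampled data would call for a generalization such as the Lomb-Scargle periodogram, and real pulsar or gravitational-wave searches use far more elaborate pipelines.

```python
import cmath
import math
import random

def schuster_periodogram(x, freqs):
    """Classical periodogram P(f) = |sum_t (x_t - mean) exp(-2 pi i f t)|^2 / N
    for evenly sampled x_0..x_{N-1}; frequencies in cycles per sample."""
    n = len(x)
    mean = sum(x) / n
    resid = [xt - mean for xt in x]
    power = []
    for f in freqs:
        s = sum(r * cmath.exp(-2j * math.pi * f * t) for t, r in enumerate(resid))
        power.append(abs(s) ** 2 / n)
    return power

# Simulated series: a weak sinusoid (0.05 cycles/sample) buried in Gaussian noise.
rng = random.Random(3)
n = 512
series = [0.5 * math.sin(2 * math.pi * 0.05 * t) + rng.gauss(0.0, 1.0) for t in range(n)]
freqs = [k / n for k in range(1, n // 2)]
power = schuster_periodogram(series, freqs)
best = max(range(len(power)), key=power.__getitem__)
print(f"strongest peak at f = {freqs[best]:.4f} cycles/sample")
```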

6. Infrastructure needed to advance astrostatistics

The current quality of statistical analyses in astronomical research often begs for improvement. There is both inadequate research on important new challenges (§5) and inadequate application of known advanced methods to astronomical problems (§3). Astronomy clearly needs a strong and rapid surge of energy in statistical expertise. Three types of activities should be promoted:

Cross-training: In the U.S., the typical curriculum leading to a career in astronomical research requires zero or one course in statistics at the undergraduate level, and zero at the graduate level. Analogously, the curriculum of statisticians includes virtually no coursework in astronomy or other physical sciences. While statisticians can learn basics from "Astronomy 101" courses given at all universities, the statistical training of astronomers is not as easily accomplished. New curricular products summarizing the applicable statistical subfields, short training workshops for graduate students and young scientists, and effective statistical consulting are all needed.



Increased collaborative research: While several astrostatistical research groups are making exciting progress (§3), the total effort is too small to impact the bulk of astronomical research. Very roughly, astrostatistical funding is currently $1M of the $1B spent annually on astronomical research. This fraction is far below that spent in biomedical or other non-physical-science fields. Though top academic leaders of statistics have expressed great enthusiasm for astronomy and astrostatistics, we cannot pull them away from biostatistics and business applications without a major increase in funding. We might seek, for example, 10-20 cross-disciplinary research groups active at any one time at the end of a decade's growth.



Statistical software: For various policy and cultural reasons, astronomers rarely purchase the large commercial statistical software packages, preferring to write their own software as needs arise. This approach has contributed to the narrow methodological scope of astronomical research. Avenues for improving this situation are emerging. R, a large statistical software package with the flexible command-line interface preferred by astronomers, has recently emerged (http://www.r-project.org). A wide variety of specialized packages and codes are also available on-line (http://www.astro.psu.edu/statcodes). The new Web services concept being developed within the context of a Virtual Observatory permits coordinated access to heterogeneous software developed specifically for astronomical applications.



At Penn State, we are in the early stages of developing a Center for Astrostatistics to help attain these goals (http://www.astrostatistics.psu.edu). This is an interdisciplinary Center to serve the astronomy and statistics communities nationally and worldwide, seeking to bring advances in statistics into the toolbox of astronomy and astrophysics. The Center's Web site will maintain the popular StatCodes, build an instructional library of R programs, coordinate with the nascent VOStat Web service, and develop an archive of annotated links to selected statistical literature applicable to astronomy (and vice versa). The site is also planned to include tutorial handbooks and curricular products developed specifically for astrostatistical needs.